Abstract

We consider the problem of minimal correction of the training set to make it
consistent with monotonic constraints. This problem arises during analysis of data
sets via techniques that require monotone data. We show that this problem is
NP-hard in general and is equivalent to finding a maximal independent set in special
orgraphs. Practically important cases of the problem are considered in detail. These
are the cases when the partial order given on the replies set is a total order or has
dimension 2. We show that the second case can be reduced to maximization of a
quadratic convex function on a convex set. For this case we construct an approximate
polynomial algorithm based on convex optimization.

Keywords: machine learning, supervised learning, monotonic constraints. 



1 Introduction

Requirements for a classifying rule in supervised learning problems consist of two parts.
The first part is induced by a set of precedents, called the training set. Each element of
the training set is a pair of "object-reply" type. A classifying rule, which is a mapping
from the objects set to the replies set, should map the objects from the training set pairs
to the corresponding replies. The second part of the requirements expresses our common
knowledge about the classifying rule. One of the popular types of such requirements is
monotonicity, which is considered in this paper. In some cases these two parts of the
requirements cannot both be satisfied, and then we have the problem of a minimal
correction of the training set.

Let us see what that problem is.

Suppose the sets $X$, $Y$ are given, and on these sets we have partial orders $\succeq^X$, $\succeq^Y$
respectively. We assume moreover that the partial order $\succeq^Y$ is a lattice. For any given
mapping $o : X' \to Y$, where $X' \subseteq X$, $|X'| < \infty$, we pose the problem of finding a function
$f : X \to Y$ which is monotone with respect to the partial orders $\succeq^X$, $\succeq^Y$ and minimizes
the following functional: $\mathrm{Er}_o(f) = |\{x \mid f(x) \neq o(x)\} \cap X'|$.

Let us denote the set of monotonic functions from $X$ to $Y$ by $M(\succeq^X, \succeq^Y)$. Then for
a given mapping $o : X' \to Y$ our task is the following:

$$\mathrm{Er}_o(f) \to \min_{f \in M(\succeq^X, \succeq^Y)}$$

Every mapping $f' : X' \to Y$ which is monotone on the subset $X' \subseteq X$ can be extended
to a mapping monotone on the whole set $X$, because $(Y, \succeq^Y)$ is a lattice. Indeed, on
every finite subset of the lattice $(Y, \succeq^Y)$ the operation $\sup$ is defined, and the function
$f(x) = \sup \{ f'(x') \mid x' \in X',\ x' \preceq^X x \}$ is both monotone and satisfies $f(x) = f'(x)$, $x \in X'$.
From this we see that in the posed problem we can assume that $X' = X$. We conclude
moreover that this problem is equivalent to finding a maximal subset
$X'' \subseteq X'$ such that the function $o$ restricted to the subset $X''$ is monotone.

So let us consider the following generalization of our problem which we will call 
MaxCMS (Maximal Consistent with Monotonicity Set). 



MaxCMS. The finite sets $B_n$, $B_m$, where $B_r = \{1, \ldots, r\}$, are given; on them
partial orders $\succeq^1$, $\succeq^2$ are defined respectively, and the function $\varphi : B_n \to B_m$ is given.
Every element $i \in B_n$ is assigned a positive integer weight $w_i$. Our task is to find
a maximal by weight subset $B \subseteq B_n$ such that the function $\varphi$ restricted to $B$ is monotone,
i.e. $\forall i, j \in B\ [\, i \succeq^1 j \rightarrow \varphi(i) \succeq^2 \varphi(j) \,]$.

Definition 1. The set $B \subseteq B_n$ is called acceptable iff the function $\varphi$ restricted to $B$
is monotone.

Definition 2. A set which is acceptable and maximal by weight is denoted by
$\mathrm{MaxCMS}(\succeq^1, \succeq^2, \varphi, w)$ (in some cases we use this notation to mean the weight of this
set).

In the remainder of the paper we will consider that problem.
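
To make the definition concrete, the following is a minimal brute-force sketch of MaxCMS for tiny inputs. It is only an illustration under stated assumptions: the function names (`is_acceptable`, `max_cms_bruteforce`), the representation of the orders as Python predicates, and the toy instance are all hypothetical and not taken from the paper, and the exhaustive search is exponential in $n$.

```python
from itertools import combinations

def is_acceptable(subset, geq1, geq2, phi):
    """True if phi restricted to `subset` is monotone:
    i >=^1 j implies phi(i) >=^2 phi(j) for all i, j in subset."""
    return all(
        geq2(phi[i], phi[j])
        for i in subset for j in subset
        if geq1(i, j)
    )

def max_cms_bruteforce(n, geq1, geq2, phi, w):
    """Exhaustive search over all subsets of {1..n}; for tiny inputs only."""
    best, best_weight = frozenset(), 0
    items = range(1, n + 1)
    for r in range(n + 1):
        for subset in combinations(items, r):
            weight = sum(w[i] for i in subset)
            if weight > best_weight and is_acceptable(subset, geq1, geq2, phi):
                best, best_weight = frozenset(subset), weight
    return best, best_weight

# Hypothetical toy instance: both orders are the usual >= on integers,
# objects 1..4 carry the replies 1, 3, 2, 4 and unit weights.
geq = lambda a, b: a >= b
phi = {1: 1, 2: 3, 3: 2, 4: 4}
w = {1: 1, 2: 1, 3: 1, 4: 1}
best, weight = max_cms_bruteforce(4, geq, geq, phi, w)
```

In this toy instance objects 2 and 3 contradict each other, so one of them has to be dropped from the training set.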

2 Training set monotonization and maximal independent sets

In this section we will show that MaxCMS is equivalent to finding a maximal independent 
set (or minimal vertex cover) in special orgraphs. 

Definition 3. Let $G = (V, E)$ be an orgraph in which every vertex $v$ has
a positive integer weight $w_v$. A set of vertexes is called independent iff no pair of its
elements is connected by an edge. The maximal by weight independent set is denoted
by $\mathrm{IS}(G, w)$ (in some cases we use this notation to mean the weight of this set).

As is well known, the complement of an independent set is a vertex cover.

Let us define the following partial preorder on $B_n$ (recall that this means a transitive and
reflexive binary predicate):

$$i \succeq^{\varphi} j \Leftrightarrow \varphi(i) \succeq^2 \varphi(j).$$

Consider the orgraph $G = (V, E)$ with $V = B_n$ and $E = \{ (i, j) \mid i \succeq^1 j,\ \varphi(i) \not\succeq^2 \varphi(j) \}$.

The orgraph $G$ can alternatively be defined through the following equalities: $V = B_n$ and
$E = \succeq^1 \cap \overline{\succeq^{\varphi}}$, where $\overline{\succeq^{\varphi}}$ is the complement of the binary predicate $\succeq^{\varphi}$.

Definition 4. An orgraph whose edge set can be represented as an intersection of a
partial order and the complement of a partial preorder is called special.

Theorem 1. The maximal acceptable set is equal to the maximal independent set
of the special orgraph $G$, i.e. $\mathrm{MaxCMS}(\succeq^1, \succeq^2, \varphi, w) = \mathrm{IS}(G, w)$.

Proof. Any independent set $B$ of the orgraph $G$ satisfies the condition: if $i, j \in B$ and
$i \succeq^1 j$ then $\varphi(i) \succeq^2 \varphi(j)$, i.e. the function $\varphi$ restricted to $B$ is monotone. The converse
statement is also correct: if the restriction of $\varphi$ to $B$ is monotone then $B$ is an independent
set in $G$. From this we obtain the proposition of the theorem.
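
The equivalence in Theorem 1 can be checked mechanically on a small instance. The sketch below builds the edge set $E = \{(i,j) \mid i \succeq^1 j,\ \varphi(i) \not\succeq^2 \varphi(j)\}$ and compares "independent in $G$" with "acceptable" for every subset. All names and the toy data are illustrative assumptions, and the orders are passed as plain Python predicates.

```python
from itertools import combinations

def special_orgraph_edges(n, geq1, geq2, phi):
    """E = {(i, j) : i >=^1 j and not phi(i) >=^2 phi(j)}."""
    nodes = range(1, n + 1)
    return {(i, j) for i in nodes for j in nodes
            if geq1(i, j) and not geq2(phi[i], phi[j])}

def is_independent(subset, edges):
    """No edge of the orgraph has both endpoints inside `subset`."""
    chosen = set(subset)
    return not any(i in chosen and j in chosen for i, j in edges)

def is_acceptable(subset, geq1, geq2, phi):
    """phi restricted to `subset` is monotone."""
    return all(geq2(phi[i], phi[j])
               for i in subset for j in subset if geq1(i, j))

def independent_iff_acceptable(n, geq1, geq2, phi):
    """Compare the two notions on every subset of {1..n} (exponential)."""
    edges = special_orgraph_edges(n, geq1, geq2, phi)
    nodes = range(1, n + 1)
    return all(
        is_independent(s, edges) == is_acceptable(s, geq1, geq2, phi)
        for r in range(n + 1) for s in combinations(nodes, r)
    )

# Hypothetical toy instance: usual >= on both sides, replies 1, 3, 2, 4.
geq = lambda a, b: a >= b
phi = {1: 1, 2: 3, 3: 2, 4: 4}
edges = special_orgraph_edges(4, geq, geq, phi)
```

Here the only conflicting pair is object 3 against object 2, so the orgraph has a single edge.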

Theorem 2. Let the special orgraph $G'$ be defined by the vertex set $V = B_n$ with
weights $w'_i$ and the edge set $E' = \succeq' \cap \overline{\succsim'}$, where both the partial order $\succeq'$ and the partial
preorder $\succsim'$ are given (i.e. the edge set $E'$ need not be decomposed). Then the problem of
finding the maximal independent set in such an orgraph is polynomially reducible to MaxCMS.

Proof. Let us divide the set $V$ into equivalence classes with respect to the predicate
$x \sim y \Leftrightarrow x \succsim' y \wedge y \succsim' x$. Then we can naturally define the corresponding mapping $\varphi' : V \to V/\sim$.
On the factor-set $V/\sim$ the partial order $\varphi'(x) \succeq'' \varphi'(y) \Leftrightarrow x \succsim' y$ is induced. It is easy to see
that $\mathrm{IS}(G', w') = \mathrm{MaxCMS}(\succeq', \succeq'', \varphi', w')$. The reduction takes a number of steps polynomial in $n$.
3 NP-hardness of MaxCMS

In the previous section it was shown that MaxCMS is equivalent to finding the maximal
independent set (or minimal vertex cover) in special orgraphs. The problem of finding an
acceptable set of cardinality more than $C$ is denoted by CMS. Obviously, it is in NP.

Theorem 3. CMS is NP-complete.

Proof. Let us reduce 3-SAT to CMS using the trick from [2].

Let a 3-CNF be given, with $U = \{u_1, \ldots, u_n\}$ being the set of variables used in it. Let
$C = \{c_1, \ldots, c_m\}$ be the set of clauses such that each clause consists of 3 literals that differ by
their variables (a literal is a symbol $u_i$ or $\bar{u}_i$). For every clause we order the literals that belong to
it. Then the fact that the literal $l$ occupies the $s$-th place in the clause $c_j$ is denoted by $l \in c_j^s$.
Let us consider the orgraph whose vertex set is the union of all literals and threefold
copies of clauses, $V = \{u_1, \bar{u}_1, \ldots, u_n, \bar{u}_n\} \cup \{c_1^1, c_1^2, c_1^3, \ldots, c_m^1, c_m^2, c_m^3\}$. Let us define the edge set
as $E = E_1 \cup E_2$, where

$$E_1 = \{(u_i, \bar{u}_i)\}_{i=1}^{n} \cup \{(u_k, c_j^s) \mid u_k \in c_j^s\} \cup \{(c_j^s, \bar{u}_k) \mid \bar{u}_k \in c_j^s\}$$

and

$$E_2 = \{(c_j^1, c_j^2),\ (c_j^2, c_j^3),\ (c_j^1, c_j^3)\}_{j=1}^{m}$$

(later we will need this division of the edge set into 2 subsets).

A vertex cover of the orgraph $G = (V, E)$ of cardinality $n + 2m$ exists iff the
original 3-CNF is satisfiable. Indeed, one vertex from every pair $u_i, \bar{u}_i$ and two from
every triple $c_j^1, c_j^2, c_j^3$ should fall into the vertex cover, because they are pairwise connected.
And so, the cardinality of a vertex cover is not less than $n + 2m$.

Suppose a vertex cover of the required cardinality exists. If the literal $u_i$ is in it we
define $u_i = \mathrm{true}$, otherwise $u_i = \mathrm{false}$. All variables are assigned in this manner,
because from the above it is clear that exactly one of $u_i$, $\bar{u}_i$ is in the cover.
Then this assignment, as is easily seen, satisfies the original 3-CNF. This reasoning
can be inverted, and we obtain that the existence of a satisfying assignment is equivalent to
the existence of a vertex cover of cardinality $n + 2m$.
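
The construction can be tried out on small formulas. The sketch below is an illustration only: clauses are encoded as triples of nonzero integers (`k` for $u_k$, `-k` for $\bar{u}_k$), clause copies are indexed from 0, and all function names are hypothetical. By the counting argument above, a cover of size $n + 2m$ must contain exactly one literal per variable and exactly two copies per clause, so the search enumerates only such candidates instead of all vertex subsets.

```python
from itertools import product

def build_reduction_graph(n, clauses):
    """Return (vertices, edges) of the orgraph G = (V, E1 | E2)."""
    vertices = [("lit", k) for k in range(1, n + 1)]
    vertices += [("lit", -k) for k in range(1, n + 1)]
    vertices += [("cl", j, s) for j in range(len(clauses)) for s in range(3)]
    edges = set()
    for k in range(1, n + 1):
        edges.add((("lit", k), ("lit", -k)))             # E1: (u_k, not u_k)
    for j, clause in enumerate(clauses):
        for s, lit in enumerate(clause):
            if lit > 0:
                edges.add((("lit", lit), ("cl", j, s)))  # E1: (u_k, c_j^s)
            else:
                edges.add((("cl", j, s), ("lit", lit)))  # E1: (c_j^s, not u_k)
        edges.add((("cl", j, 0), ("cl", j, 1)))          # E2: clause triangle
        edges.add((("cl", j, 1), ("cl", j, 2)))
        edges.add((("cl", j, 0), ("cl", j, 2)))
    return vertices, edges

def has_cover_of_size_n_plus_2m(n, clauses):
    """Search covers made of one literal per variable, two copies per clause."""
    _, edges = build_reduction_graph(n, clauses)
    m = len(clauses)
    for signs in product([1, -1], repeat=n):
        lits = {("lit", sign * k) for k, sign in zip(range(1, n + 1), signs)}
        for omitted in product(range(3), repeat=m):
            cover = lits | {("cl", j, s) for j in range(m)
                            for s in range(3) if s != omitted[j]}
            if all(u in cover or v in cover for u, v in edges):
                return True
    return False

def satisfiable(n, clauses):
    """Plain truth-table check, used only for comparison."""
    return any(
        all(any((lit > 0) == assign[abs(lit) - 1] for lit in cl) for cl in clauses)
        for assign in product([False, True], repeat=n)
    )

sat_formula = [(1, 2, 3), (-1, -2, 3)]
# All eight sign patterns over three variables: an unsatisfiable 3-CNF.
unsat_formula = [(a * 1, b * 2, c * 3) for a, b, c in product([1, -1], repeat=3)]
```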

Now let us consider the orgraph $G' = (V, E^* \setminus E)$, where $E^*$ is the transitive closure
of $E$. Suppose that the edge set of $G'$ is transitive. Then, defining $\succeq\ = E^*$ and $\succsim\ = E^* \setminus E$,
we obtain that $\succeq \cap \overline{\succsim} = E$. This means that our problem is reduced to finding
the minimal vertex cover and, consequently, the maximal independent set of the special
orgraph $G = (V, E)$, which by theorem 2 is equivalent to MaxCMS, or to CMS with
$C = 2n + 3m - (n + 2m) = n + m$.

Let us show that the edge set of $G'$ is transitive. As $E^*$ is transitive, $E^* \setminus E$ is not
transitive only if there exist $(u, v), (v, t) \in E^* \setminus E$ such that $(u, t) \in E$. Let $(u, t) \in \{(u_i, \bar{u}_i)\}_{i=1}^{n}$.
It is easy to see that no path in $G$ starting with $u_i$ and passing through a clause vertex can end with the
literal $\bar{u}_i$, because otherwise there would exist a clause that contains both $u_i$ and $\bar{u}_i$. Let
us now consider the case when $(u, t) \in \{(c_j^1, c_j^2), (c_j^2, c_j^3), (c_j^1, c_j^3)\}$. In that case a path
starting from $c_j^{s}$ and finishing in $c_j^{s'}$ cannot contain an element which does not belong
to $\{c_j^1, c_j^2, c_j^3\}$. Consequently, $(u, v), (v, t) \in E$, which contradicts $(u, v), (v, t) \in E^* \setminus E$.

The last case is when $(u, t) \in \{(u_k, c_j^s) \mid u_k \in c_j^s\}$. But every path in the orgraph $G$ which
starts in $u$ and finishes in $t$ is equal to the edge $(u, t)$, and this means $(u, v) \notin E^* \setminus E$. The
case $(u, t) \in \{(c_j^s, \bar{u}_k) \mid \bar{u}_k \in c_j^s\}$ is considered in the same manner. So, the set $E^* \setminus E$ is
transitive and the reduction of 3-CNF to CMS is done.
4 1-MaxCMS



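Any partial order on a finite set can be represented as an intersection of total orders.

Definition. Let the partial order $\succeq$ be given on the set $M$. The minimal number $d$
such that $\succeq$ is an intersection of total orders $\succeq_1, \ldots, \succeq_d$, i.e. $\succeq\ =\ \succeq_1 \cap \ldots \cap \succeq_d$, is called the
dimension of $\succeq$.

Consider MaxCMS with input $(\succeq^1, \succeq^2, \varphi, w)$ in the case when the dimension of $\succeq^2$ is equal
to $d$. In that case $\succeq^2\ =\ \succeq_1 \cap \ldots \cap \succeq_d$. The corresponding special orgraph $G = (V, E)$ satisfies:
$V = B_n$ and $E = \succeq^1 \cap \overline{\succeq^{\varphi}}$, where $i \succeq^{\varphi} j \Leftrightarrow i \succeq^{\varphi}_1 j\ \&\ \ldots\ \&\ i \succeq^{\varphi}_d j$ and $i \succeq^{\varphi}_s j \Leftrightarrow \varphi(i) \succeq_s \varphi(j)$.
And then,

$$E = \succeq^1 \cap \overline{\succeq^{\varphi}_1 \cap \ldots \cap \succeq^{\varphi}_d} = \succeq^1 \cap \left( \overline{\succeq^{\varphi}_1} \cup \ldots \cup \overline{\succeq^{\varphi}_d} \right) = \left( \succeq^1 \cap \overline{\succeq^{\varphi}_1} \right) \cup \ldots \cup \left( \succeq^1 \cap \overline{\succeq^{\varphi}_d} \right).$$

As each predicate $\succ_s\ =\ \succeq^1 \cap \overline{\succeq^{\varphi}_s}$ is transitive, $E$ is a union of $d$ transitive predicates.

Definition. The problem MaxCMS with input $(\succeq^1, \succeq^2, \varphi, w)$ for the case when the
dimension of $\succeq^2$ is equal to $d$ is called $d$-MaxCMS.

In fact, the above shows the following.

Theorem 4. The problem $d$-MaxCMS is reduced to finding the maximal independent
set in the orgraph $G = (V, E)$, where $E = \succ_1 \cup \ldots \cup \succ_d$, the predicates $\succ_s$ are transitive and
there are no cycles in $G$.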
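
The decomposition of $E$ into transitive parts can be illustrated for replies that are integer pairs ordered componentwise. As an assumption of this sketch (not stated in the paper), the componentwise order is realized as the intersection of two lexicographic total orders, so $d = 2$; all names and the toy data are hypothetical.

```python
def transitive_parts(n, geq1, phi, total_orders):
    """One edge set per total order s: {(i, j) : i >=^1 j and not phi(i) >=_s phi(j)}."""
    nodes = range(1, n + 1)
    return [
        {(i, j) for i in nodes for j in nodes
         if geq1(i, j) and not geq_s(phi[i], phi[j])}
        for geq_s in total_orders
    ]

def is_transitive(rel):
    """(a, b) and (b, d) in rel imply (a, d) in rel."""
    return all((a, d) in rel for a, b in rel for c, d in rel if b == c)

# Two lexicographic total orders whose intersection is the componentwise order.
lex_pq = lambda a, b: a >= b                          # compare p first, then q
lex_qp = lambda a, b: (a[1], a[0]) >= (b[1], b[0])    # compare q first, then p
componentwise = lambda a, b: a[0] >= b[0] and a[1] >= b[1]

geq1 = lambda i, j: i >= j                            # total order on the objects
phi = {1: (1, 1), 2: (2, 0), 3: (0, 2), 4: (1, 1)}    # hypothetical replies

part1, part2 = transitive_parts(4, geq1, phi, [lex_pq, lex_qp])
full = {(i, j) for i in range(1, 5) for j in range(1, 5)
        if geq1(i, j) and not componentwise(phi[i], phi[j])}
```

In this instance the full edge set is not transitive, while each of the two parts is, which is exactly the structure Theorem 4 relies on.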

From theorem 4 we see that 1-MaxCMS is reduced to finding the maximal independent
set in the circuit-free orgraph $G = (V, E)$ whose edge set satisfies the
following transitivity rule: if $(u, v), (v, t) \in E$ then $(u, t) \in E$. This problem is polynomially
tractable, because the graph obtained from $G$ by replacing
oriented edges with non-oriented ones is a comparability graph of some partial order, which is
known to be perfect. We will adduce one of the proofs of the tractability, due to [1].

Theorem 5. 1-MaxCMS is polynomially tractable.

Proof. Defining $x \prec y \Leftrightarrow (x, y) \in E$, the orgraph can be seen as a partially ordered
set $(V, \prec)$. The algorithm solves the problem via reducing it to the task of minimizing
a flow in some circuit-free network. Let us denote the sets of minimal and maximal
elements of $(V, \prec)$ by $\min G$ and $\max G$ respectively. For every vertex $v \in V$ of the
orgraph $G$ we introduce 2 copies $v^+$, $v^-$. And then we define $V' = \{v^+, v^-\}_{v \in V} \cup \{s, t\}$ and

$$E' = \{(v^+, v^-)\}_{v \in V} \cup \{(x^-, y^+) \mid (x, y) \in E\} \cup \{(s, a^+) \mid a \in \min G\} \cup \{(b^-, t) \mid b \in \max G\}.$$

We obtain the orgraph $G' = (V', E')$. The minimal flow through the edge $(v^+, v^-)$ is
defined to be equal to the corresponding weight $w_v$, and for other edges it equals 0. The
maximal flow through every edge is $\infty$. It is easy to see that for every edge $e \in E'$ of the
orgraph $G'$ we can find a path from $s$ to $t$ that goes through $e$. It is well known that under
that condition we can apply the min flow-max cut theorem.

The minimal flow of the given network, which can be obtained via a modified
Ford-Fulkerson algorithm (the common algorithm finds the maximal flow), corresponds to the maximal
W-cut (the common algorithm finds the minimal cut), where by the weight of a cut we mean the
following expression:

$$\sum_{(u,v) \in E',\ u \in S,\ v \in \bar{S}} c_{\min}(u, v) \; - \sum_{(u,v) \in E',\ v \in S,\ u \in \bar{S}} c_{\max}(u, v).$$

Note that the weight of a cut is defined differently from the sum of weights between
the parts of a cut, and that is why we call the problem maximal W-cut. Consider any cut
$V' = S \cup \bar{S}$, where $s \in S$, $t \in \bar{S}$, with weight different from $-\infty$. Since the maximal flow
through the edges is $\infty$, for every edge $(u, v) \in E'$ there cannot be $v \in S$, $u \in \bar{S}$. And edges
$(u, v) \in E'$ with $u \in S$, $v \in \bar{S}$ can make a contribution to the weight of the cut only when
$u = r^+$, $v = r^-$. Let us denote $R = \{r \mid r^+ \in S,\ r^- \in \bar{S}\}$. Obviously, the elements of $R$
constitute an independent set in $G$, and the weight of the cut is exactly equal to the weight
of this set. The converse statement is also correct, i.e. every independent set $R$
of $G$ corresponds to the cut $S = \{u^+, u^- \mid u \notin R\ \&\ \exists r \in R\ [(u, r) \in E]\} \cup \{r^+ \mid r \in R\} \cup \{s\}$, the
weight of the cut being equal to the weight of $R$. From this it is clear that the result of the
algorithm will be the maximal cut, which corresponds to the maximal independent set in $G$.
The theorem is proved.

The task of finding the minimal flow can be written in LP form:

$$x(r) \geq 0, \quad r \in G(s, t)$$
$$\sum_{r \in G(v)} x(r) \geq w_v, \quad v \in V$$
$$\sum_{r \in G(s,t)} x(r) \to \min$$

where $G(s, t)$ is the set of paths in the orgraph $G' = (V', E')$ from $s$ to $t$, and $G(v) \subset G(s, t)$ is
the set of paths going through the edge $(v^+, v^-)$. In the dual form:

$$y(v) \geq 0, \quad v \in V$$
$$\sum_{v \in r} y(v) \leq 1, \quad r \in G(s, t)$$
$$\sum_{v \in V} w_v\, y(v) \to \max$$

From the above we conclude that the dual problem always has a boolean
solution. The polyhedron of the dual problem is denoted by $\Pi(G)$.

5 2-MaxCMS 

Now we will consider the problem 2-MaxCMS. This problem arises when the partial order
on the replies set is not total but, for example, has a tree structure. As we know, it can
be reduced to finding the maximal independent set in the circuit-free orgraph $G = (V, E)$,
where $E = \succ_1 \cup \succ_2$ and the predicates $\succ_1$, $\succ_2$ are transitive. From now on we will consider
just that problem.

Note that the edges of the circuit-free orgraph from theorem 3 are also divided into 2 sets
$E_1$ and $E_2$, both of them being transitive. From this we conclude that the problem is
NP-hard.

Consider 2 orgraphs: $G_1 = (V, \succ_1)$ and $G_2 = (V, \succ_2)$. Note that the maximal
independent set of the orgraph $G = (V, E)$ is also an independent set in both $G_1$ and $G_2$.
Then the following theorem is obvious.

Theorem 6. The set of solutions to the following quadratic programming problem

$$x \in \Pi(G_1), \quad y \in \Pi(G_2), \quad \psi(x, y) = \sum_{v \in V} w_v x_v y_v \to \max$$

contains boolean $x^*, y^*$ such that $\{v \mid x^*_v y^*_v = 1\}$ is the maximal independent set in $G$.

Proof. With fixed $x$ (fixed $y$) the maximum of $\sum_{v \in V} w_v x_v y_v$ is reached on some boolean
$y$ (boolean $x$). It means that the maximum over both vectors can be achieved with boolean
values of the components.

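Theorem 7. The following is true:

$$\max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \psi(x, y) = \max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \gamma(x, y),$$

where

$$\gamma(x, y) = \frac{1}{2} \sum_{v \in V} \left[ w_v (x_v + y_v)^2 - w_v (x_v + y_v) \right].$$

Proof.

$$\max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \sum_{v \in V} w_v x_v y_v = \max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \frac{1}{2} \sum_{v \in V} \left[ w_v (x_v + y_v)^2 - w_v (x_v^2 + y_v^2) \right] \geq$$

$$\geq \max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \frac{1}{2} \sum_{v \in V} \left[ w_v (x_v + y_v)^2 - w_v (x_v + y_v) \right].$$

The inequality holds because $0 \leq x_v, y_v \leq 1$ on the polyhedra. Since the maximum of the left
part of the inequality is achieved on boolean vectors, where both parts coincide, it is clear
that equality holds. Taking into account that the functional $\gamma(x, y)$ is convex, we see
that the problem is reduced to the maximization of a convex quadratic function on a
convex set.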

Consider the functional

$$\varphi(x, y) = -\frac{1}{2} \sum_{v \in V} \left[ w_v (x_v - y_v)^2 - w_v (x_v + y_v) \right].$$

Theorem 8. The following is true:

$$\max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \varphi(x, y) \geq \max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \psi(x, y),$$

the values of $\varphi(x, y)$ and $\psi(x, y)$ being equal on the boolean vectors of the polyhedron
$x \in \Pi(G_1)$, $y \in \Pi(G_2)$.

Proof. The verification of the second statement is straightforward. The first follows from it,
because the maximum of the right part by theorem 6 can be achieved on boolean vectors.
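
The relation between the three functionals can be sanity-checked numerically. The sketch below assumes the quadratic forms $\psi = \sum_v w_v x_v y_v$, $\gamma = \frac{1}{2}\sum_v w_v[(x_v + y_v)^2 - (x_v + y_v)]$ and $\varphi = \frac{1}{2}\sum_v w_v[(x_v + y_v) - (x_v - y_v)^2]$, and it samples points from the unit box $[0,1]^n$ rather than from the polyhedra $\Pi(G_1) \times \Pi(G_2)$, which is a simplification. Function names and the weights are hypothetical.

```python
import random
from itertools import product

def psi(x, y, w):
    """Bilinear objective: sum of w_v * x_v * y_v."""
    return sum(wv * xv * yv for wv, xv, yv in zip(w, x, y))

def gamma(x, y, w):
    """Convex minorant of psi on the unit box."""
    return 0.5 * sum(wv * ((xv + yv) ** 2 - (xv + yv))
                     for wv, xv, yv in zip(w, x, y))

def phi(x, y, w):
    """Concave majorant of psi on the unit box."""
    return 0.5 * sum(wv * ((xv + yv) - (xv - yv) ** 2)
                     for wv, xv, yv in zip(w, x, y))

def sandwich_holds(w, trials=1000, seed=0, tol=1e-12):
    """Check gamma <= psi <= phi at random points of [0, 1]^n."""
    rng = random.Random(seed)
    n = len(w)
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        p = psi(x, y, w)
        if gamma(x, y, w) > p + tol or p > phi(x, y, w) + tol:
            return False
    return True

def agree_on_boolean(w, tol=1e-12):
    """Check gamma == psi == phi on every pair of 0/1 vectors."""
    n = len(w)
    for x in product([0, 1], repeat=n):
        for y in product([0, 1], repeat=n):
            p = psi(x, y, w)
            if abs(gamma(x, y, w) - p) > tol or abs(phi(x, y, w) - p) > tol:
                return False
    return True

weights = [1, 2, 3]
```

The gap $\varphi - \psi$ equals $\frac{1}{2}\sum_v w_v[(x_v - x_v^2) + (y_v - y_v^2)]$, which vanishes exactly when all components are 0 or 1.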
Consider the following optimization task:

$$x \in \Pi(G_1), \quad y \in \Pi(G_2), \quad \varphi(x, y) \to \max$$

Let us call it the convex task.

Definition. A pair $x^* \in \Pi(G_1)$, $y^* \in \Pi(G_2)$ such that

$$\max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \varphi(x, y) - \varphi(x^*, y^*) \leq \varepsilon$$

is called an $\varepsilon$-solution of the convex task.

Theorem 9. For every $\varepsilon$ the convex task can be $\varepsilon$-solved in polynomial time. The
length of the input is the sum of the lengths of the descriptions of $G_1 = (V, \succ_1)$, $G_2 = (V, \succ_2)$
and the integer weights $w_v$. The obtained $\varepsilon$-solution $(x^*, y^*)$ satisfies $|x^*_i - y^*_i| \leq \frac{1}{2}$ for all $i$.
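Lemma. The pair $z^{opt} = (x^{opt}, y^{opt}) = \arg\max_{x \in \Pi(G_1),\, y \in \Pi(G_2)} \varphi(x, y)$ satisfies

$$\left| x^{opt}_i - y^{opt}_i \right| \leq \frac{1}{2}, \quad i = \overline{1, n}.$$

Proof of lemma. The quadratic functional $\varphi(x, y)$ is not bounded in $\mathbb{R}^{2n}$, and its
maximum on the set $\Pi(G_1) \times \Pi(G_2)$ is located on the border of the polyhedron. Let
$a_1^T z \leq b_1, \ldots, a_s^T z \leq b_s$ be those inequalities from the definition of the polyhedron
that turn into equalities at $z^{opt}$. From the optimality of $z^{opt}$ it is clear that the cone

$$\{l \mid a_1^T l \leq 0\} \cap \ldots \cap \{l \mid a_s^T l \leq 0\} \cap \{l \mid \nabla\varphi(z^{opt})^T l > 0\} = \emptyset.$$

And then, from the theorem of Farkas-Minkowski, we conclude that $\nabla\varphi(z^{opt})$ can be expanded
as a positive combination of the vectors $a_1, \ldots, a_s$. But taking into account that the components
of those vectors are positive, we obtain that

$$\nabla\varphi(z^{opt}) = \left( w_1 (y^{opt}_1 - x^{opt}_1 + \tfrac{1}{2}), \ldots, w_n (y^{opt}_n - x^{opt}_n + \tfrac{1}{2}),\ w_1 (x^{opt}_1 - y^{opt}_1 + \tfrac{1}{2}), \ldots, w_n (x^{opt}_n - y^{opt}_n + \tfrac{1}{2}) \right) \geq 0.$$

Since the weights are positive, both $y^{opt}_i - x^{opt}_i + \frac{1}{2} \geq 0$ and $x^{opt}_i - y^{opt}_i + \frac{1}{2} \geq 0$ hold for every $i$. The lemma is proved.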

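Proof of theorem. Since the function $\varphi(x, y)$ is concave, the set of pairs

$$x \in \Pi(G_1), \quad y \in \Pi(G_2), \quad \varphi(x, y) \geq c, \quad -\tfrac{1}{2} \leq x_i - y_i \leq \tfrac{1}{2},\ i = \overline{1, n}$$

is convex.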

Note that for every given pair of vectors $x', y'$ the task of deciding whether it belongs
to the set $\Pi(G_1) \times \Pi(G_2)$ or not can be solved in polynomial time. Indeed, by the
Floyd-Warshall algorithm we can find the longest path from $s$ to $t$ in the orgraphs $G_1$ and $G_2$
in polynomial time, where by the length of a path we mean the sum of the weights ($x'_v$ or $y'_v$
respectively) of the vertexes on the path. Comparing the results with 1, we see that if they do
not exceed 1 then $(x', y') \in \Pi(G_1) \times \Pi(G_2)$. Besides, if $(x', y') \notin \Pi(G_1) \times \Pi(G_2)$, then a path whose length
is greater than 1 gives us a violated inequality in the definition of the polyhedron
$\Pi(G_1) \times \Pi(G_2)$.
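
A minimal sketch of such a membership test is given below. It differs from the text in one respect: instead of the Floyd-Warshall algorithm it uses dynamic programming over a topological order, which suffices because the orgraph is circuit-free. It also works on $G$ directly rather than on the network with $s$ and $t$, since with nonnegative vertex weights a heaviest path can always be extended to a path from a minimal to a maximal element. The function names and the example data are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def heaviest_path(vertices, edges, y):
    """Return (weight, path) of a heaviest path in a circuit-free orgraph,
    where the weight of a path is the sum of y[v] over its vertices."""
    preds = {v: set() for v in vertices}
    for u, v in edges:
        preds[v].add(u)
    best = {}  # best[v] = (weight of heaviest path ending at v, previous vertex)
    for v in TopologicalSorter(preds).static_order():
        prev = max(preds[v], key=lambda u: best[u][0], default=None)
        base = best[prev][0] if prev is not None else 0.0
        best[v] = (base + y[v], prev)
    end = max(best, key=lambda v: best[v][0])
    weight, path, node = best[end][0], [], end
    while node is not None:
        path.append(node)
        node = best[node][1]
    return weight, path[::-1]

def separation_oracle(vertices, edges, y, tol=1e-12):
    """Return (True, None) if y lies in Pi(G); otherwise (False, witness),
    where the witness is a vertex list naming a violated inequality."""
    for v in vertices:
        if y[v] < -tol:
            return False, [v]      # violated constraint y(v) >= 0
    weight, path = heaviest_path(vertices, edges, y)
    if weight > 1 + tol:
        return False, path         # violated path inequality
    return True, None

# Hypothetical transitive chain 1 -> 2 -> 3.
V = [1, 2, 3]
E = {(1, 2), (2, 3), (1, 3)}
inside = separation_oracle(V, E, {1: 1.0, 2: 0.0, 3: 0.0})
outside = separation_oracle(V, E, {1: 0.5, 2: 0.5, 3: 0.5})
```

The returned path plays the role of the violated inequality mentioned above and can be handed to a cutting-plane or ellipsoid-type method as a separating hyperplane.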

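And for given $x', y'$, checking the condition $\varphi(x', y') \geq c$ and, in the negative case,
finding a separating hyperplane for the pair and the set $\{(x, y) \mid \varphi(x, y) \geq c + \varepsilon\}$ can be
done in polynomial time.

Indeed,

$$\{(x, y) \in \Pi(G_1) \times \Pi(G_2) \mid (\nabla_x \varphi(x', y'),\ x - x') + (\nabla_y \varphi(x', y'),\ y - y') > \varepsilon\} \supseteq$$

$$\supseteq \{(x, y) \in \Pi(G_1) \times \Pi(G_2) \mid \varphi(x, y) \geq c + \varepsilon\}.$$

This can be seen from the following inequalities for the concave quadratic function $\varphi$ and points
$(x, y)$, $(x', y')$ such that $\varphi(x, y) \geq c + \varepsilon$ and $\varphi(x', y') < c$:

$$\varepsilon < \varphi(x, y) - \varphi(x', y') \leq (\nabla_x \varphi(x', y'),\ x - x') + (\nabla_y \varphi(x', y'),\ y - y').$$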

Then rounding each component of the vectors $\nabla_x \varphi(x', y')$ and $\nabla_y \varphi(x', y')$ to the first
$2(\log n + |\log \varepsilon| + 1)$ symbols in binary representation, and denoting the results by $c_x$ and $c_y$,
gives us the separating hyperplane

$$\left\{ (x, y) \mid (c_x,\ x - x') + (c_y,\ y - y') \geq \tfrac{\varepsilon}{2} \right\}.$$
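According to [3], in this case a pair $x', y'$ that satisfies

$$x' \in \Pi(G_1), \quad y' \in \Pi(G_2), \quad \varphi(x', y') \geq c, \quad -\tfrac{1}{2} \leq x'_i - y'_i \leq \tfrac{1}{2},\ i = \overline{1, n}$$

can be found in polynomial time, or it will be shown that

$$\{(x, y) \mid x \in \Pi(G_1),\ y \in \Pi(G_2),\ \varphi(x, y) \geq c + \varepsilon,\ -\tfrac{1}{2} \leq x_i - y_i \leq \tfrac{1}{2},\ i = \overline{1, n}\} = \emptyset.$$

Taking into account that $|\varphi(x, y)| \leq 2 \sum_{v \in V} w_v$, by the method of binary division we
find a constant $c$ such that the set

$$Q = \{(x, y) \mid x \in \Pi(G_1),\ y \in \Pi(G_2),\ \varphi(x, y) \geq c,\ -\tfrac{1}{2} \leq x_i - y_i \leq \tfrac{1}{2},\ i = \overline{1, n}\} \neq \emptyset$$

and

$$\{(x, y) \mid x \in \Pi(G_1),\ y \in \Pi(G_2),\ \varphi(x, y) \geq c + \varepsilon,\ -\tfrac{1}{2} \leq x_i - y_i \leq \tfrac{1}{2},\ i = \overline{1, n}\} = \emptyset.$$

From the lemma we see that $z^{opt} \in Q$ and $\varphi(z^{opt}) < c + \varepsilon$. And every pair from $Q$ is an
$\varepsilon$-solution of the task. The theorem is proved.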

Consider the following approximate algorithm for 2-MaxCMS.

1. Find a pair $(x', y')$ such that $\max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \varphi(x, y) \leq \varphi(x', y') + \varepsilon$ and $|x'_i - y'_i| \leq \frac{1}{2}$,

where $\varepsilon > 0$ is a fixed small constant.

2. Find $x^* = \arg\max_{x \in \Pi(G_1)} \psi(x, y')$ and $y^* = \arg\max_{y \in \Pi(G_2)} \psi(x^*, y)$. Here $x^*, y^*$ are
boolean.

The answer of the algorithm is the set of vertexes $\{v \mid x^*_v y^*_v = 1\}$.
It is easy to see that all stages of the algorithm are polynomial. Let us investigate its
answer.



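Denote $W = \sum_{v \in V} w_v$ and $\varphi(x', y') = \alpha W$. It is clear that $0 \leq \alpha \leq 1$.

Theorem 10. The following is true:

$$\max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \psi(x, y) - \psi(x^*, y^*) \leq \left( \tfrac{1}{4} - \left( \alpha - \tfrac{1}{2} \right)^2 \right) W + \varepsilon,$$

if $\alpha \geq \frac{1}{2}$. And also

$$\max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \psi(x, y) - \psi(x^*, y^*) \leq \tfrac{1}{4} W + \varepsilon,$$

when $\frac{3}{8} \leq \alpha \leq \frac{1}{2}$. And

$$\max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \psi(x, y) - \psi(x^*, y^*) \leq \left( \tfrac{1}{4} - \left( \alpha - \tfrac{3}{8} \right)^2 \right) W + \varepsilon,$$

if $\alpha \leq \frac{3}{8}$.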



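Proof. Let us bound $\varphi(x', y') - \psi(x', y')$, using the concavity of $f(x) = x - x^2$:

$$\varphi(x', y') - \psi(x', y') = \frac{1}{2} \sum_{v \in V} w_v \left[ (x'_v - {x'_v}^2) + (y'_v - {y'_v}^2) \right] \leq \sum_{v \in V} w_v \left[ \frac{1}{4} - \left( \frac{x'_v + y'_v - 1}{2} \right)^2 \right].$$

When $\alpha \geq \frac{1}{2}$:

$$\alpha W = \varphi(x', y') \leq \sum_{v \in V} w_v \frac{x'_v + y'_v}{2},$$

and from this:

$$\sum_{v \in V} w_v \frac{x'_v + y'_v - 1}{2} \geq \left( \alpha - \tfrac{1}{2} \right) W.$$

Then

$$\varphi(x', y') - \psi(x', y') \leq \tfrac{1}{4} W - t,$$

where $t = \min \sum_{v \in V} w_v t_v^2$, the minimum being taken over the vectors with $\sum_{v \in V} w_v t_v \geq (\alpha - \frac{1}{2}) W$.
It is obvious that $t = (\alpha - \frac{1}{2})^2 W$. So, we obtain

$$\varphi(x', y') - \psi(x', y') \leq \left( \tfrac{1}{4} - \left( \alpha - \tfrac{1}{2} \right)^2 \right) W = \alpha (1 - \alpha) W.$$

Then using $\varphi(x', y') \geq \max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \psi(x, y) - \varepsilon$ and $\psi(x^*, y^*) \geq \psi(x', y')$ we finally
obtain:

$$\max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \psi(x, y) - \psi(x^*, y^*) \leq \left( \tfrac{1}{4} - \left( \alpha - \tfrac{1}{2} \right)^2 \right) W + \varepsilon.$$

Almost analogously, when $\alpha \leq \frac{3}{8}$, using $|x'_v - y'_v| \leq \frac{1}{2}$:

$$\alpha W = \varphi(x', y') = \sum_{v \in V} w_v \left[ \frac{x'_v + y'_v}{2} - \frac{(x'_v - y'_v)^2}{2} \right] \geq \sum_{v \in V} w_v \frac{x'_v + y'_v}{2} - \tfrac{1}{8} W,$$

and from this:

$$\sum_{v \in V} w_v \frac{1 - x'_v - y'_v}{2} \geq \left( \tfrac{3}{8} - \alpha \right) W.$$

Analogously,

$$\varphi(x', y') - \psi(x', y') \leq \tfrac{1}{4} W - s,$$

where $s = \min \sum_{v \in V} w_v t_v^2 = (\alpha - \frac{3}{8})^2 W$. And finally,

$$\max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \psi(x, y) - \psi(x^*, y^*) \leq \varphi(x', y') - \psi(x', y') + \varepsilon \leq \left( \tfrac{1}{4} - \left( \alpha - \tfrac{3}{8} \right)^2 \right) W + \varepsilon.$$

The statement of the theorem in the case when $\frac{3}{8} \leq \alpha \leq \frac{1}{2}$ is obvious. The theorem is proved.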
6 Conclusion

As mentioned above, MaxCMS can be considered as a subcase of the vertex cover problem.
From this point of view the task of finding MaxCMS is equivalent to the task of removing
"noisy" objects with a minimal total weight from the training set. Let us compare the
approximation ratio of our algorithm with the well-known standard 2-approximation of
vertex cover, which can be found for any graph with weighted vertexes in polynomial time [1].

It is clear that $\varepsilon$ can be made arbitrarily small, and it does not play any role in the
bound of theorem 10 because the bounded value is integer. So, for simplicity, we will
assume that $\varepsilon = 0$. Let us denote $\varphi(x', y') = \alpha W \geq W - \Delta = \max \psi(x, y)$.

It is obvious that the approximation ratio 2 has a meaning only if the maximal
consistent with monotonicity set of a special orgraph has a weight of more than half of the sum
of the weights of all vertexes, i.e. $\alpha \geq \alpha' = \frac{W - \Delta}{W} \geq \frac{1}{2}$. In this case from theorem 10 we
obtain that:

$$\max_{(x, y) \in \Pi(G_1) \times \Pi(G_2)} \psi(x, y) - \psi(x^*, y^*) \leq \alpha (1 - \alpha) W \leq \alpha' (1 - \alpha') W = \alpha' \Delta,$$

which means that our algorithm has an approximation ratio equal to $1 + \alpha' \leq 2$.

For "almost correct" data, i.e. when $\alpha' \approx 1$, the algorithm has an approximation ratio
close to the standard 2. But for "noisy" data it appears to be better than the standard one. In the
extreme case when $\alpha' \approx \frac{1}{2}$, the standard 2-approximation gives no guarantee that we
will not remove all objects as "noise". On the contrary, the total weight of the objects removed
by our algorithm can in no case exceed the optimal solution by more than $\frac{1}{4} W$. And the
bound of theorem 10 shows that our algorithm can find good approximations to MaxCMS
even in cases when more than half of the training set consists of "noisy" data ($\alpha < \frac{1}{2}$).